When Can Nonconvex Optimization Problems be Solved with Gradient Descent? A Few Case Studies
Gradient descent and related algorithms are ubiquitously used to solve optimization problems arising in machine learning and signal processing. In many cases, these problems are nonconvex, yet such simple algorithms are still effective. In an attempt to better understand this phenomenon, we study a number of nonconvex problems, proving that they can be solved efficiently with gradient descent. We consider complete, orthogonal dictionary learning, and present a geometric analysis that allows us to obtain efficient convergence rates for gradient descent that hold with high probability. We also show that similar geometric structure is present in other nonconvex problems such as generalized phase retrieval.
Turning next to neural networks, we will also calculate conditions on certain classes of networks under which signals and gradients propagate through the network in a stable manner during the initial stages of training. Initialization schemes derived using these calculations allow training recurrent networks on long sequence tasks, and in the case of networks with low precision activation functions they make explicit a tradeoff between the reduction in precision and the maximal depth of a model that can be trained with gradient descent.
We finally consider manifold classification with a deep feed-forward neural network, for a particularly simple configuration of the manifolds. We provide an end-to-end analysis of the training process, proving that under certain conditions on the architectural hyperparameters of the network, it can successfully classify any point on the manifolds with high probability, in a timely manner, given a sufficient number of independent samples from the manifold. Our analysis relates the depth and width of the network to its fitting capacity and statistical regularity, respectively, in the early stages of training.
On quantum backpropagation, information reuse, and cheating measurement collapse
The success of modern deep learning hinges on the ability to train neural
networks at scale. Through clever reuse of intermediate information,
backpropagation facilitates training through gradient computation at a total
cost roughly proportional to running the function, rather than incurring an
additional factor proportional to the number of parameters - which can now be
in the trillions. Naively, one expects that quantum measurement collapse
entirely rules out the reuse of quantum information as in backpropagation. But
recent developments in shadow tomography, which assumes access to multiple
copies of a quantum state, have challenged that notion. Here, we investigate
whether parameterized quantum models can train as efficiently as classical
neural networks. We show that achieving backpropagation scaling is impossible
without access to multiple copies of a state. With this added ability, we
introduce an algorithm with foundations in shadow tomography that matches
backpropagation scaling in quantum resources while reducing classical auxiliary
computational costs to open problems in shadow tomography. These results
highlight the nuance of reusing quantum information for practical purposes and
clarify the unique difficulties in training large quantum models, which could
alter the course of quantum machine learning.
Comment: 29 pages, 2 figures
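The classical cost argument in this abstract (gradient cost proportional to running the function, rather than to the number of parameters times that) can be illustrated with a small sketch. This is a purely classical toy, not the quantum algorithm the abstract describes: the chain `x_{i+1} = sin(theta_i * x_i)` and the function names are assumptions chosen for illustration. Backpropagation reuses the stored intermediates from one forward pass, while the finite-difference baseline reruns the whole chain twice per parameter.

```python
import math

def forward(thetas, x0):
    """Run the toy chain x_{i+1} = sin(theta_i * x_i), keeping all intermediates."""
    xs = [x0]
    for t in thetas:
        xs.append(math.sin(t * xs[-1]))
    return xs

def grad_backprop(thetas, x0):
    """Gradient of the final output w.r.t. every theta_i.

    One forward pass plus one backward sweep that reuses the stored xs.
    """
    xs = forward(thetas, x0)
    grads = [0.0] * len(thetas)
    upstream = 1.0  # derivative of the output w.r.t. x_{i+1}
    for i in reversed(range(len(thetas))):
        c = math.cos(thetas[i] * xs[i])
        grads[i] = upstream * c * xs[i]       # d x_{i+1} / d theta_i
        upstream = upstream * c * thetas[i]   # d x_{i+1} / d x_i
    return grads

def grad_finite_diff(thetas, x0, eps=1e-6):
    """Baseline without information reuse: two full forward passes per parameter."""
    grads = []
    for i in range(len(thetas)):
        plus = list(thetas)
        minus = list(thetas)
        plus[i] += eps
        minus[i] -= eps
        grads.append((forward(plus, x0)[-1] - forward(minus, x0)[-1]) / (2 * eps))
    return grads
```

For `n` parameters, `grad_finite_diff` calls `forward` `2 * n` times, whereas `grad_backprop` calls it once; both should agree up to finite-difference error.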
Suppressing quantum errors by scaling a surface code logical qubit
Practical quantum computing will require error rates that are well below what
is achievable with physical qubits. Quantum error correction offers a path to
algorithmically-relevant error rates by encoding logical qubits within many
physical qubits, where increasing the number of physical qubits enhances
protection against physical errors. However, introducing more qubits also
increases the number of error sources, so the density of errors must be
sufficiently low in order for logical performance to improve with increasing
code size. Here, we report the measurement of logical qubit performance scaling
across multiple code sizes, and demonstrate that our system of superconducting
qubits has sufficient performance to overcome the additional errors from
increasing qubit number. We find our distance-5 surface code logical qubit
modestly outperforms an ensemble of distance-3 logical qubits on average, both
in terms of logical error probability over 25 cycles and logical error per
cycle ( compared to ). To investigate
damaging, low-probability error sources, we run a distance-25 repetition code
and observe a logical error per round floor set by a single
high-energy event ( when excluding this event). We are able
to accurately model our experiment, and from this model we can extract error
budgets that highlight the biggest challenges for future systems. These results
mark the first experimental demonstration where quantum error correction begins
to improve performance with increasing qubit number, illuminating the path to
reaching the logical error rates required for computation.
Comment: Main text: 6 pages, 4 figures. v2: Update author list, references,
Fig. S12, Table I
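The abstract's point that errors must be sufficiently rare for a larger code to help can be seen in a minimal sketch. It rests on strong assumptions that are not from the paper: a classical repetition code, independent bit flips with probability `p` on each of `d` bits, a single round with perfect readout, and majority-vote decoding. The function name `logical_error_rate` is hypothetical, and this is not the surface code or the decoder used in the experiment.

```python
from math import comb

def logical_error_rate(p: float, d: int) -> float:
    """Probability that majority vote fails on a distance-d repetition code.

    Assumes d is odd and each bit flips independently with probability p.
    Decoding fails when more than d // 2 bits are flipped.
    """
    max_correctable = d // 2
    return sum(
        comb(d, k) * p**k * (1 - p) ** (d - k)
        for k in range(max_correctable + 1, d + 1)
    )

if __name__ == "__main__":
    for p in (0.01, 0.6):
        rates = {d: logical_error_rate(p, d) for d in (3, 5)}
        print(p, rates)
```

In this toy model, going from `d = 3` to `d = 5` lowers the logical error rate when `p` is below one half and raises it when `p` is above one half, which is the qualitative threshold behavior the abstract refers to.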